A Fast Algorithm for Making Suffix Arrays and for Burrows-Wheeler Transformation
Abstract
We propose a fast and memory-efficient algorithm for sorting the suffixes of a text in lexicographic order. Sorting suffixes matters because the resulting array of suffix indexes, called a suffix array, is a memory-efficient alternative to the suffix tree. Suffix sorting is also used for the Burrows-Wheeler transformation in Block Sorting text compression, so fast sorting algorithms are desirable. We compare the suffix array construction algorithms of Bentley-Sedgewick, Andersson-Nilsson, and Karp-Miller-Rosenberg, and the suffix tree construction algorithm of Larsson, in terms of speed and required memory, and we propose a new algorithm that is fast and memory efficient by combining them. We also define a measure of the difficulty of sorting suffixes: the average match length. Our algorithm is effective when the average match length of a text is large, especially for large databases.
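The relationship between suffix sorting, the suffix array, and the Burrows-Wheeler transformation can be illustrated with a minimal Python sketch. This is not the algorithm proposed in the paper: it uses a naive comparison sort purely to show what is being computed. All function names are hypothetical, the text is assumed to end with a unique sentinel character `$` that sorts before every other character, and "average match length" is assumed here to mean the mean length of the longest common prefix between lexicographically adjacent suffixes.

```python
def suffix_array(text):
    """Naive suffix sorting: order suffix start positions by the suffix text.

    Illustrative only; each comparison may scan long prefixes, which is
    exactly the cost that fast suffix sorting algorithms try to avoid.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text, sa):
    """Burrows-Wheeler transform read off a suffix array.

    Assumes text ends with a unique sentinel. For each sorted suffix, the
    output is the character that precedes it (wrapping around for suffix 0).
    """
    return "".join(text[i - 1] for i in sa)


def common_prefix_length(text, i, j):
    """Length of the longest common prefix of the suffixes starting at i and j."""
    n = len(text)
    k = 0
    while i + k < n and j + k < n and text[i + k] == text[j + k]:
        k += 1
    return k


def average_match_length(text, sa):
    """Mean common-prefix length over adjacent suffixes in sorted order.

    Assumption: this is one plausible reading of the measure named in the
    abstract, not a definition quoted from the paper.
    """
    if len(sa) < 2:
        return 0.0
    total = sum(common_prefix_length(text, sa[r - 1], sa[r])
                for r in range(1, len(sa)))
    return total / (len(sa) - 1)


text = "banana$"
sa = suffix_array(text)
print(sa)                               # [6, 5, 3, 1, 0, 4, 2]
print(bwt_from_suffix_array(text, sa))  # annb$aa
print(average_match_length(text, sa))   # 1.0
```

Under this reading, a text with many long repeats has a large average match length, which makes a naive sort like the one above slow because each comparison must scan far into the suffixes before they differ.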
Similar resources
A Fast Algorithms for Making Suffix Arrays and for Burrows-Wheeler Transformation
We propose a fast and memory efficient algorithm for sorting suffixes of a text in lexicographic order. It is important to sort suffixes because an array of indexes of suffixes is called suffix array and it is a memory efficient alternative of the suffix tree. Sorting suffixes is also used for the Burrows-Wheeler transformation in the Block Sorting text compression, therefore fast sorting algorithms are desire...
Constructing Suffix Arrays of Large Texts
Recently, Sadakane [12] proposed a new fast and memory efficient algorithm for sorting suffixes of a text in lexicographic order. It is important to sort suffixes because an array of indexes of suffixes is called suffix array and it is a memory efficient alternative of the suffix tree. Sorting suffixes is also used for the Burrows-Wheeler transformation in the Block Sorting text compression, therefore fast sor...
A Cooperative Distributed Text Database Management Method Unifying Search and Compression Based on the Burrows-Wheeler Transformation
A new text database management method for distributed cooperative environments is proposed, which can collect texts in distributed sites through a network of narrow bandwidth and enables full-text search in a unified efficient manner. This method is based on the two new developments in full-text search data structures and data compression. Specifically, the Burrows-Wheeler transformation is used as ...
Approximate Pattern Matching Over the Burrows-Wheeler Transformed Text
The compressed pattern matching problem is to locate the occurrence(s) of a pattern P in a text string T using a compressed representation of T, with minimal (or no) decompression. In this paper, we consider approximate pattern matching directly on Burrows-Wheeler transformed (BWT) text which is a critical step for a fully compressed pattern matching algorithm on a BWT based compression algorit...
Efficient haplotype matching and storage using the positional Burrows–Wheeler transform (PBWT)
MOTIVATION: Over the last few years, methods based on suffix arrays using the Burrows-Wheeler Transform have been widely used for DNA sequence read matching and assembly. These provide very fast search algorithms, linear in the search pattern size, on a highly compressible representation of the dataset being searched. Meanwhile, algorithmic development for genotype data has concentrated on stati...